Statistics is the scientific discipline of understanding phenomena by collecting and analyzing data. In practice, we often cannot survey every individual, so we use 'sampling' to infer properties of the whole from a part.
1. Core Terms in Statistical Surveys
- Census (Complete Survey): A method that surveys every individual in the population.
- Sampling Survey: Selecting a portion of individuals from the population for investigation and using this as a basis to estimate and infer the overall situation.
- Population: The entire set of survey subjects.
- Individual: Each survey subject that makes up the population.
- Sample: The portion of individuals drawn from the population.
- Sample size: The number of individuals included in the sample.
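The terms above can be illustrated with a minimal sketch of simple random sampling. The scores below are synthetic values generated for illustration only, and the variable names are assumptions, not part of the lesson.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Population: the scores of 5,000 students (synthetic, not real data).
# Each score in the list is one individual.
population = [rng.randint(0, 100) for _ in range(5000)]

# Sample: 200 individuals drawn at random without replacement.
sample = rng.sample(population, 200)

# Sample size: the number of individuals in the sample.
sample_size = len(sample)

print("population size:", len(population))
print("sample size:", sample_size)
```

Note that the sample is the list of 200 scores, while the sample size is the single number 200.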
2. Multiple Methods of Data Acquisition
In addition to directly obtaining data via surveys such as a census, we can also obtain data through:
- Experiment: In statistics, the study of designing experiments is called 'experimental design'.
- Observation: Collecting information in natural conditions.
- Query: Obtaining data previously collected by others; such data is known as secondary data.
Samples are random, so statistical inferences about a population made from a sample carry a degree of uncertainty (i.e., there may be errors). Keep this in mind when using statistical results to interpret real-world problems.
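Proportion Formula (stratified random sampling with proportional allocation, where $n$ is the total sample size and $N$ is the population size): $\frac{n}{N} = \frac{\text{Stratum Sample Size}}{\text{Stratum Population Size}}$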
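As a minimal sketch of the proportion formula, the function below computes how many individuals to draw from each stratum. The function name `proportional_allocation` and the stratum labels are hypothetical; the sizes are illustrative.

```python
def proportional_allocation(stratum_sizes, n):
    """Return the number to sample from each stratum so that every
    stratum uses the same sampling ratio n / N as the whole population."""
    N = sum(stratum_sizes.values())  # total population size
    # stratum sample size = stratum population size * n / N
    return {name: size * n / N for name, size in stratum_sizes.items()}


# Hypothetical population of 1,000 split into three strata; sample size 100.
allocation = proportional_allocation({"A": 250, "B": 450, "C": 300}, n=100)
print(allocation)  # {'A': 25.0, 'B': 45.0, 'C': 30.0}
```

When the result is not a whole number, it has to be rounded, so the rounded stratum sample sizes may not add up to exactly $n$.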
QUESTION 1
To understand the scores of 5,000 students taking a computer proficiency test in a certain area, 200 students were randomly selected for investigation and analysis. In this context, the 200 selected students are ( ).
A. Population
B. Individual
C. Sample
D. Sample size
Correct! The population is the scores of the 5,000 students, and the scores of the 200 selected students constitute a sample.
Incorrect. The 200 students are a subset of the population, i.e., the sample. The sample size refers to the numerical value 200.
QUESTION 2
A company has $N$ employees divided into several departments. To conduct a stratified random sampling with proportional allocation, a sample of size $n$ is to be drawn from all employees. If a department has $m$ employees, how many employees should be sampled from this department? ( )
$\frac{m}{n} \cdot N$
$\frac{n}{N} \cdot m$
$\frac{m}{N} \cdot m$
$n - m$
Correct! According to the principle of proportional allocation in stratified random sampling, the sampling ratio is $\frac{n}{N}$, so the number to be sampled from this department is $m \times \frac{n}{N}$.
Incorrect. Under proportional allocation, the sampling ratio within each stratum must equal the overall sampling ratio, i.e., $\frac{\text{Stratum Sample Size}}{m} = \frac{n}{N}$.
QUESTION 3
Which of the following surveys is most suitable for using a sampling survey? ( )
Surveying the grain sowing area across villages in a county
Understanding the germination rate of a batch of corn seeds
An enterprise surveying employee health check-up records
A full vision census of students in a class
Correct! Determining the germination rate of corn seeds is destructive, making a complete survey impossible, so sampling is essential.
Incorrect. If a survey is destructive (e.g., seed germination rate, light bulb lifespan) or involves a very large population, a sampling survey should be chosen.
QUESTION 4
A public health department in a region surveyed 200 students about smoking habits, with 58 answering 'yes.' Can you estimate the percentage of smokers among students in this area?
29%
58%
20%
Cannot be estimated
Correct! Estimate the population percentage using the sample frequency: $58 \div 200 = 0.29 = 29\%$.
Incorrect. Divide the number of 'yes' responses in the sample by the sample size to get the sample frequency, then use it to estimate the population proportion.
QUESTION 5
The main difference between simple random sampling and stratified random sampling lies in ( ).
Different sample sizes
Whether each individual has an equal probability of being selected
Whether sampling is conducted by grouping based on individual differences
Completely different data processing methods
Correct! Stratified random sampling is suitable for populations with significant internal differences, reducing sampling error through stratification.
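Note: Under simple random sampling, and under stratified random sampling with proportional allocation, each individual has an equal probability of being selected. The difference is that stratified sampling uses auxiliary information about the population (differences between strata) to group individuals before sampling.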
QUESTION 6
For $m$ data points $x_i$ with mean $\bar{x}$, and $n$ data points $y_j$ with mean $\bar{y}$, the correct formula for the combined overall mean is ( ).
$\frac{\bar{x} + \bar{y}}{2}$
$\frac{m\bar{x} + n\bar{y}}{m+n}$
$\frac{\bar{x} + \bar{y}}{m+n}$
$\frac{m+n}{\bar{x} + \bar{y}}$
Correct! This reflects the concept of weighted averages and is the core formula for estimating the overall mean in stratified sampling.
Incorrect. You cannot simply add the means and divide by 2; you must consider the sample size (weight) of each group.
QUESTION 7
Regarding the 'uncertainty' in sampling surveys, which statement is correct? ( )
As long as the method is scientific, the conclusion is absolute truth
The results of a sampling survey have no reference value
The conclusion is inferred from the sample and carries a risk of randomness
A census result can also produce uncertain errors
Correct! Statistical inference results carry uncertainty because sample selection is random.
Incorrect. Uncertainty is inherent in statistical inference: conclusions drawn from a sample are probabilistic rather than certain.
QUESTION 8
Which of the following survey methods belongs to acquiring 'secondary data'? ( )
Measuring students’ 100-meter times directly during physical education classes
Consulting population data in the Statistical Yearbook at the library
Designing questionnaires to survey passersby’s consumption habits
Recording reaction times through chemical experiments
Correct! Consulting data already collected and organized by others constitutes acquiring secondary data.
Incorrect. Secondary data refers to data not obtained directly by the investigator through original observation or experimentation.
QUESTION 9
In stratified random sampling, if the total population size is 1000, the sample size is 100, and a certain stratum has 250 individuals, how many individuals should be sampled from this stratum? ( )
10
25
50
100
Correct! The sampling ratio is $100/1000 = 0.1$, so this stratum should sample $250 \times 0.1 = 25$ individuals.
Incorrect. Use the proportion formula: Stratum sample size = (Sample size / Total population size) × Stratum population size.
QUESTION 10
In simple random sampling, the probability that each individual is selected is ( ).
1
$n/N$
$1/n$
$1/N$
Correct! In simple random sampling with a sample size of $n$ and a population size of $N$, the probability that each individual is selected is $n/N$.
Incorrect. Although it is random sampling, the probability of each individual being selected depends on the ratio between sample size and population size.
Challenge: Statistical Plan Design and Inference
Reading Material: The municipal government plans to adopt a tiered electricity pricing system, determining standards based on sampled data from 200 households (range 50–350 kWh). The goal is to have 75% of residents in the first tier, 20% in the second tier, and the remaining 5% in the third tier.
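One possible way to turn the 75% / 20% / 5% goal into tier boundaries is to use the 75th and 95th percentiles of the sampled usage. The sketch below assumes that reading; the usage values are synthetic (the actual survey data is not given here), and `tier1_cap` and `tier2_cap` are hypothetical names.

```python
import random
from statistics import quantiles

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Synthetic monthly usage (kWh) for 200 households in the 50-350 range.
# These are NOT the real survey values.
usage = [rng.uniform(50, 350) for _ in range(200)]

# quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
cuts = quantiles(usage, n=100)
tier1_cap = cuts[74]  # 75th percentile: upper limit of the first tier
tier2_cap = cuts[94]  # 95th percentile: upper limit of the second tier

# Share of sampled households that fall in each of the first two tiers.
share_tier1 = sum(u <= tier1_cap for u in usage) / len(usage)
share_tier2 = sum(tier1_cap < u <= tier2_cap for u in usage) / len(usage)

print(f"first tier up to {tier1_cap:.0f} kWh, share {share_tier1:.2f}")
print(f"second tier up to {tier2_cap:.0f} kWh, share {share_tier2:.2f}")
```

Because the boundaries come from a sample, they are estimates of the population percentiles and carry the usual sampling uncertainty.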
1. [Short Answer] Prove the formula for the overall mean in stratified sampling: $\frac{\sum_{i=1}^m x_i + \sum_{j=1}^n y_j}{m+n} = \frac{m}{m+n}\bar{x} + \frac{n}{m+n}\bar{y}$
Proof: By definition of the mean, $\sum_{i=1}^m x_i = m\bar{x}$ and $\sum_{j=1}^n y_j = n\bar{y}$.
Substitute into the left-hand side numerator:
Left side $= \frac{m\bar{x} + n\bar{y}}{m+n} = \frac{m\bar{x}}{m+n} + \frac{n\bar{y}}{m+n} = \frac{m}{m+n}\bar{x} + \frac{n}{m+n}\bar{y}$.
Proved. This formula shows that the overall mean is a weighted average of the stratum means, with weights proportional to the number of data points in each stratum.
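The identity can also be checked numerically. The sketch below uses two small made-up groups of values; it is an illustration of the formula, not a replacement for the proof.

```python
import math
from statistics import mean

# Two hypothetical groups of data points.
x = [62, 70, 75, 81]          # m = 4 values, mean 72
y = [55, 58, 64, 66, 72, 75]  # n = 6 values, mean 65
m, n = len(x), len(y)

# Left side: mean of all m + n values pooled together.
pooled = mean(x + y)

# Right side: weighted average of the two group means.
weighted = m / (m + n) * mean(x) + n / (m + n) * mean(y)

# The unweighted average of the two means ignores the group sizes.
naive = (mean(x) + mean(y)) / 2

print(math.isclose(pooled, weighted))  # True
print(naive)  # 68.5, which differs from the pooled mean of 67.8
```

The unweighted average agrees with the pooled mean only when the two groups have the same size or the same mean.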
2. [Writing Task] Design a plan for a 'school-wide student body weight survey' (approximately 500 words).
Key Points of Reference Plan:
1. Clarify Objectives: Understand the average weight and obesity rate distribution among all students.
2. Define Population and Individuals: All students in the school constitute the population, with each student as an individual.
3. Choose Sampling Method: Considering significant developmental differences across grades and gender, it is recommended to use stratified random sampling. Use grade (Grade 10, 11, 12) and gender as stratification criteria.
4. Determine Sample Size: Based on manpower costs, select 10% of students (e.g., 300 people).
5. Implement Data Collection: Use direct measurement (recording readings from a weight scale) rather than self-reporting, since self-reported values may introduce bias.
6. Analysis and Inference: Calculate sample mean and standard deviation, draw a frequency distribution histogram, and define the 'overweight' standard based on percentiles.
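Steps 3 and 4 of the reference plan can be sketched as follows. The roster of 3,000 students is synthetic, and the field layout `(student_id, grade, gender)` is an assumption made for illustration.

```python
import random
from collections import defaultdict

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical roster: (student_id, grade, gender) for 3,000 students.
roster = [
    (i, rng.choice([10, 11, 12]), rng.choice(["F", "M"]))
    for i in range(3000)
]

# Group students into strata by (grade, gender).
strata = defaultdict(list)
for student in roster:
    _, grade, gender = student
    strata[(grade, gender)].append(student)

# Draw the same 10% ratio from every stratum (proportional allocation).
sampling_ratio = 0.10
sample = []
for key, members in sorted(strata.items()):
    k = round(len(members) * sampling_ratio)  # stratum sample size
    sample.extend(rng.sample(members, k))     # simple random sample within the stratum

print("strata:", len(strata))
print("total sample size:", len(sample))
```

Because each stratum sample size is rounded to a whole number, the total may differ from 300 by a few students.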
3. [Short Answer] Someone says: 'Sampling surveys save manpower and resources and yield similar results to censuses, so sampling is always preferable.' Do you think this view is reasonable?
Reference Answer:
This view has some merit but is overly absolute.
(1) Advantages: Sampling surveys indeed offer economic efficiency and timeliness, and are the only option when dealing with destructive tests (e.g., seed germination trials) or infinite populations.
(2) Limitations: Sampling surveys involve sampling errors, and conclusions carry 'uncertainty.' For scenarios requiring extremely high precision, involving major national decisions (e.g., a national population census), or legally mandated full coverage, censuses remain irreplaceable.
(3) Conclusion: Selection should be flexible based on survey purpose, cost, and population size.
✨ Key Takeaways
Population and individuals are clearly distinguished, random sampling ensures fairness, stratified proportions must be accurate, and sample estimation carries uncertainty!
💡 Key Point of Stratification
The core of stratified sampling lies in small variation within strata and large variation between strata.
💡 Note on Sample Size
The larger the sample size $n$, the smaller the sampling error typically is, but the higher the cost.
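A small simulation can make this trade-off concrete. The sketch below assumes a hypothetical population of 5,000 in which 29% of individuals answer 'yes', draws many simple random samples of two different sizes, and compares how much the sample proportions vary.

```python
import random
from statistics import pstdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 5,000: 1 means 'yes' (29%), 0 means 'no'.
population = [1] * 1450 + [0] * 3550


def spread_of_estimates(n, repeats=500):
    """Draw `repeats` simple random samples of size n and return the
    standard deviation of the resulting sample proportions."""
    estimates = [sum(rng.sample(population, n)) / n for _ in range(repeats)]
    return pstdev(estimates)


spread_small = spread_of_estimates(50)   # small samples: estimates vary widely
spread_large = spread_of_estimates(800)  # large samples: estimates cluster together

print(f"n = 50:  spread of estimates {spread_small:.3f}")
print(f"n = 800: spread of estimates {spread_large:.3f}")
```

Each individual estimate still differs from the true 29%, but with the larger sample size the estimates stay much closer to it.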
💡 Census vs Sampling
Destructive tests (e.g., light bulb lifespan, seed germination rate) are unsuitable for a census, because testing every item would destroy the entire batch.
💡 Data Cleaning
After obtaining secondary data, verify the source's authority and timeliness, and perform necessary data cleaning.
💡 Understanding Uncertainty
The estimated smoking rate of 29% derived from sampling is just an estimate and does not mean the population is definitely 29%.